4.

A. Spider silks have become widely recognized for their mechanical properties,
which are comparable to or surpass those of other natural fibers and even many
manmade materials. For example, some silks exhibit reversible stretch (>200%)
greater than that of highly elastic nylon, while others have a tensile strength
(>200,000 psi) greater than that of steel. The molecular basis for how spider
silks achieve such feats was the focus of our research. In particular, we
1) cloned and sequenced cDNA for the previously uncharacterized flagelliform
silk; 2) determined the genomic organization of the flagelliform silk gene;
3) developed hypotheses relating silk sequence motifs to specific mechanical
properties; and 4) designed a strategy to study the expression and fiber
formation of flagelliform silk protein.

B.  (1)  Characterization  of  Flagelliform  Silk  cDNA 

We accomplished the primary goal of cloning the gene for flagelliform silk
(Hayashi & Lewis, 1998), the stretchiest of all known silks. Flagelliform silk forms
part of the capture spiral of an orb-web and has a lower tensile strength (1 x 10^9 N m^-2)
but several times the extensibility (>200%) of dragline silk (Vollrath & Edmonds,
1989; Köhler & Vollrath, 1995). A functioning capture spiral is a composite of
secretions from the aggregate and flagelliform silk glands. We focused on
flagelliform silk because it forms the actual fiber of the spiral, while aggregate silk
is laid down as a non-fibrous, aqueous coating of sticky droplets.

The mechanical properties of flagelliform silk correspond to the capture
spiral's ecological function. A spider's orb-web has to immediately stop a rapidly
flying insect in a manner that allows the prey to become entangled and trapped. To
do this, the web must absorb the energy of the insect without breaking and yet not
act as a trampoline that bounces the insect away from the web. Without the high
elasticity of the capture spiral, an orb-web could not be as effective an aerial net.

To clone the flagelliform silk protein gene, we constructed a cDNA library
from mRNA expressed in the flagelliform silk gland of Nephila clavipes. From this
library we generated complete sequences for several cDNA clones. These clones
span ~6,000 base pairs of the flagelliform gene and include a putative secretory
signal, the carboxy-terminal end, and substantial portions of intervening sequence
(see GenBank accession numbers AF027972 and AF027973). When this large
amount of sequence data is translated, it becomes clear that the flagelliform protein
can be summarized by three repeating amino acid motifs: Gly-Pro-Gly-Gly-X;
Gly-Gly-X; and a non-glycine-rich "spacer." Surprisingly, this protein lacks the
poly-Ala regions that are prevalent in Bombyx (the silkworm) silk and the other
characterized spider silks.

The Gly-Pro-Gly-Gly-X motif is the dominant repeat. The first four amino
acids of this unit are highly conserved among the numerous individual repeats.
The fifth amino acid, indicated by X, is variable. However, only a small subset of
residues (Ala, Ser, Tyr, and Val) occupies 90% of those positions. Moreover, the
distribution of the four residues among the repeats is non-random. Repeat units
with X=Ala tend to be followed by other units with X=Ala. This pattern creates an
array of exact Gly-Pro-Gly-Gly-Ala repeats. A second type of array has repeats with
X=Tyr alternating with repeats with X=Ser or Val. Flagelliform is the only silk
protein known to have these types of patterned higher-level variation.
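
The patterning of the X position can be checked mechanically once a translated sequence is in hand. The following is a minimal sketch, assuming a tandem array read in frame from its first residue; the helper names (`x_residues`, `neighbor_counts`) and the toy sequence are illustrative only and are not taken from the cloned data.

```python
from collections import Counter


def x_residues(seq, core="GPGG"):
    """Return the variable fifth residue (X) of each tandem GPGGX unit.

    Reads 5-residue units in frame from the start of `seq` and stops at the
    first unit that does not begin with the conserved core.
    """
    xs = []
    i = 0
    while seq.startswith(core, i) and i + 5 <= len(seq):
        xs.append(seq[i + 4])
        i += 5
    return xs


def neighbor_counts(xs):
    """Count how often each X residue is followed by each other X residue."""
    return Counter(zip(xs, xs[1:]))


# Toy array (hypothetical, not real sequence data): an X=Ala block followed
# by X=Tyr alternating with X=Ser or X=Val.
toy = "GPGGA" * 4 + "GPGGY" + "GPGGS" + "GPGGY" + "GPGGV"
xs = x_residues(toy)
print(Counter(xs))
print(neighbor_counts(xs))
```

In a real array, a large excess of (Ala, Ala) pairs and of (Tyr, Ser), (Ser, Tyr), and (Tyr, Val) pairs over what the overall residue frequencies predict would reflect the non-random distribution described above.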

The second glycine-rich motif, Gly-Gly-X, occurs approximately one-tenth as
often as Gly-Pro-Gly-Gly-X. As in the proline-containing motif, the X residue
is predominantly Ala or Ser. Gly-Gly-X is present between the Gly-Pro-Gly-Gly-X
arrays and the spacer and may serve as a transition between them.

The third repetitive element is both the longest and the least common motif.
These regions are termed "spacers" because they interrupt the glycine- and
proline-rich flagelliform sequence. Although each spacer, at 28 amino acids, is
substantially longer than a Gly-Pro-Gly-Gly-X or Gly-Gly-X motif, the spacers are
very highly conserved: individual spacers differ from each other by only one or
two residues.

Not only is the flagelliform sequence composed of individual repetitive motifs,
but the motifs themselves are organized into larger ensemble repeats. The
Gly-Pro-Gly-Gly-X motifs are arrayed in tandem 43 to 63 times. These
repeats are followed by six to twelve Gly-Gly-X motifs, a "spacer" region, and finally
a single Gly-Gly-X. Then the ensemble repeat begins again with
Gly-Pro-Gly-Gly-X.
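
The ensemble repeat can be written down as a pattern. The sketch below is one possible encoding as a regular expression, under the assumption that the unit counts given in the text are strict bounds; the 28-residue spacer string is a made-up placeholder (not the real spacer sequence), and the function names are hypothetical.

```python
import re

# Placeholder 28-residue spacer: hypothetical, NOT the real sequence. It only
# mimics two stated properties: length 28 and presence of Asp (D) and Glu (E).
SPACER_PLACEHOLDER = "T" * 13 + "D" + "T" * 13 + "E"

# One ensemble repeat: 43-63 GPGGX units, 6-12 GGX units, a 28-residue
# spacer, then a single GGX.
ENSEMBLE = re.compile(
    r"(?P<gpggx>(?:GPGG[A-Z]){43,63})"
    r"(?P<ggx>(?:GG[A-Z]){6,12})"
    r"(?P<spacer>[A-Z]{28})"
    r"(?P<tail>GG[A-Z])"
)


def build_ensemble(n_gpggx, n_ggx, spacer=SPACER_PLACEHOLDER):
    """Assemble a toy ensemble repeat with X fixed to Ala or Ser."""
    return "GPGGA" * n_gpggx + "GGS" * n_ggx + spacer + "GGA"


def describe(seq):
    """Return unit counts if `seq` is exactly one ensemble repeat, else None."""
    m = ENSEMBLE.fullmatch(seq)
    if m is None:
        return None
    return {
        "gpggx_units": len(m["gpggx"]) // 5,
        "ggx_units": len(m["ggx"]) // 3,
        "spacer_len": len(m["spacer"]),
    }


print(describe(build_ensemble(43, 6)))
print(describe(build_ensemble(10, 6)))  # too few GPGGX units, so no match
```

A real spacer would replace the placeholder, and a real scan would need to tolerate the occasional divergent residue in the conserved Gly-Pro-Gly-Gly positions.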

[Figure: Repetitive Units Within the Flagelliform Silk Protein. Individual
motifs (Gly-Pro-Gly-Gly-X, Gly-Gly-X, and spacer) are organized into ensemble
repeats.]


We propose that the three repetitive motifs and their occurrence in ensemble
repeats reveal the structural basis for elasticity. The dominant motif of this
protein, Gly-Pro-Gly-Gly-X, appears up to 63 times in tandem arrays. This motif
likely forms Pro2-Gly3 type II β-turns, with the resulting series of concatenated
β-turns forming a helix (termed a β-spiral). This β-spiral, much like a spring,
can be stretched and, upon release of tension, recoils to its original length.

This simple model provides for elasticity at the level of individual β-spirals.
However, the existence of distinct Gly-Pro-Gly-Gly-X neighborhoods (i.e., X=Ala
arrays or X=Tyr alternating with X=Ser/Val arrays) suggests that interactions
between β-spirals also have significance. Research on elastins (Urry et al., 1995),
glutens (Van Dijk et al., 1997), and the bacterial virulence factor P.69 pertactin
(Emsley et al., 1996) has shown that bonds can form between adjacent β-spirals. As
the flagelliform β-spirals associate, the protein monomers can align and assemble
into silk fibers.

The spacer regions may also be involved in the assembly of silk fibers. While
the precise structure and function of the spacers remain unknown, their
remarkably high sequence conservation and possession of the only negatively
charged residues (Asp, Glu) among the repetitive units suggest that they are a critical
component of the silk. We propose that the spacers create distinct regions where
the flagelliform proteins can overlap and align with each other, resulting in a fiber
of woven monomers. The negatively charged residues of the spacers may also
promote post-assembly interactions with the coating of aqueous aggregate gland
silk.

(2). Genomic Organization of the Flagelliform Silk Gene

In addition to cloning cDNAs for the flagelliform protein, we also isolated a
large portion of the silk gene from Nephila clavipes genomic DNA through a
combination of screening a λ library and PCR amplification (Hayashi & Lewis, in
prep.). We completely sequenced the ~17.6 kb of DNA fragments and confirmed the
cDNA results. More importantly, the genomic sequences reveal that flagelliform
has a unique gene organization. The gene is evenly divided between coding and
non-coding sequence. Aside from a small exon of non-repetitive 5' sequence, each
exon encodes a single ensemble repeat (the higher-level repeat unit made up of
Gly-Pro-Gly-Gly-X, Gly-Gly-X, and spacer subunits). Even more surprising was the
discovery that the introns between these iterative exons also share an extremely
high level of identity. Thus, the introns and exons themselves form an even higher
level of repeating unit within the flagelliform gene. This gene organization provides
strong evidence that concerted evolution is involved in the maintenance and
diversification of flagelliform silk.

(3). Hypotheses Relating Silk Amino Acid Motifs to Mechanical Properties

One  of  the  attractive  features  of  studying  spider  silks  is  that  unlike  other 
silk-producing  organisms,  spiders  produce  multiple  types  of  silks  during  all  stages 
of  their  lifetime.  Because  each  silk  is  used  for  a  specific  ecological  function,  the 
different  silks  have  their  own  distinctive  combinations  of  mechanical  properties. 
This  naturally  occurring  diversity  allows  for  comparative  studies  among  closely 
related  proteins.  Thus  we  compared  the  new  flagelliform  data  to  the  available 
sequence and structural studies done for other spider silks (Hayashi, Shipley, &
Lewis,  submitted).  We  concentrated  primarily  on  correlating  amino  acid  motifs 
with  two  mechanical  features:  elasticity  and  tensile  strength.  The  hypotheses  that 
were  generated  have  significance  for  the  design  of  synthetic  silks  with  properties 
precisely  engineered  for  specific  applications. 

The three best-known silks differ in their elasticity and strength. Major
ampullate (dragline) silk has moderate elasticity and high tensile strength, while
minor ampullate (web reinforcement) silk has a lower tensile strength and lacks
elasticity. Flagelliform silk also has a lower tensile strength than major ampullate
silk but has much higher elasticity. The sequencing of cDNAs for these silks has
established that they are composed almost entirely of repetitive elements. By
comparing the consensus repeats from the different silks, we have shown that there
are four types of amino acid motifs shared by all known spider silks:
1) poly-Ala/poly-Gly-Ala; 2) GGX; 3) GPGGX/GPGQQ; and 4) "spacers." All of these
elements were also found in orthologous cDNAs from Araneus (Guerette et al., 1996).

Abbreviations: Flag = flagelliform protein; MaSp1, MaSp2, ADF-3, ADF-4 = major ampullate
proteins; MiSp1, MiSp2, ADF-1 = minor ampullate proteins; ADF-2 = putative tubuliform protein

Evidence from biophysical studies on silks suggests that specific secondary
structures are related to some of these shared motifs. Fiber X-ray diffraction and
NMR data have been most useful in substantiating the presence of poly-Ala and
poly-Gly-Ala regions as β-sheet (Simmons et al., 1994; Kümmerlen et al., 1996;
Simmons et al., 1996; Parkhe et al., 1997). These regions could serve as the linkage
points for the crystalline areas in the fiber. Presumably, these are the parts of the
protein that bind the monomers together in the fiber and provide tensile strength.
These β-sheets can be depicted with poly-Ala forming a structure with successive
alanine residues placed on alternate sides of a backbone. Each chain can then
interlock with an adjacent chain. This configuration provides additional
hydrophobic binding energy, as the β-sheet regions are known to be poorly hydrated
(Matsuno & Lewis, unpub. data).

The poly-Gly-Ala regions can form a structure similar to poly-Ala. The key
feature of the poly-Gly-Ala configuration is that all the glycines are on one side of
the backbone and all the alanines are on the other side. Because the glycine side of
the polypeptide chain cannot form as many hydrophobic interactions as are
possible with poly-Ala, the poly-Gly-Ala regions have a lower binding
energy than poly-Ala β-sheets. This model is in agreement with the lower tensile
strength of minor ampullate silk (with poly-Gly-Ala) relative to major ampullate
silk (with poly-Ala). Thus, the strength of interactions between the β-sheet regions
of these proteins predicts the tensile strength of each silk.

The second shared motif is Gly-Gly-X. These repeat regions have been
proposed to form either a β-sheet (Thiel et al., 1994) or a helix. We prefer the
helical motif because it is supported by both FTIR and NMR data (Dong et al., 1991;
Kümmerlen et al., 1996). A tight 3₁₀ helix is consistent with GGX being three
residues in length. Such helices could serve as a transition or link between
crystalline β-sheet regions and less rigid protein structures. Also, neighboring GGX
helices may interact to maintain alignment among adjacent protein molecules in
the fiber.

The GPGXX pentapeptide repeat, the third shared motif, has been suggested
to adopt a structure similar to the β-turn spiral of elastin (Urry et al., 1975;
Chang et al., 1989; Urry et al., 1995). Our particular model for the β-spiral formed
by the GPGGSGPGGY segment of flagelliform silk is shown at right. Two features
are notable in this model. The first is the similarity of this structure to a spring
that could easily serve as the elastic mechanism in the fiber. The proline residue
would be the focal point for the retraction energy after stretching. Because
extension forces the proline bonds to torque, a large force can be generated for
retraction. The second key feature of the model is the positioning of the hydroxyls
in Ser and Tyr for hydrogen bonding with downstream Gly residues. Notice that the
long Tyr sidechains stabilize the tight β-turns and the shorter Ser sidechains
stabilize the layers of coils. The importance of these bonds is supported by the
strong tendency for Tyr and Ser residues to regularly alternate in flagelliform silk.

Only the major ampullate and flagelliform silks contain the GPGXX motif,
and they are also the stretchiest of spider silks. As further support for the GPGXX
motif providing the elasticity module, there is a correspondence between the
number of tandemly arrayed GPGXX repeats and the different extensibilities of the
two silks. Nephila major ampullate silk, with up to 35% extension, has at most nine
β-turns in a row before interruption by another motif (Hinman & Lewis, 1992).
Flagelliform silk, with 200% extensibility, has a minimum of 43 contiguously linked
β-turns in its spring-like spirals (Hayashi & Lewis, 1998). Thus, the longer the
molecular spring, the greater the elasticity of the silk.
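
The quantity being compared here, the longest uninterrupted run of GPGXX units, is straightforward to extract from a translated sequence. The following is a rough sketch under simplifying assumptions: a run is read in frame from the first Gly-Pro-Gly it encounters, and the two toy sequences are invented to mirror the counts quoted above (nine and 43), not copied from the published clones.

```python
import re


def longest_tandem_run(seq, unit=r"GPG[A-Z]{2}"):
    """Return the largest number of back-to-back GPGXX units found in `seq`.

    Each regex match is one uninterrupted run of 5-residue units; the run
    length in units is the matched length divided by five.
    """
    best = 0
    for m in re.finditer(f"(?:{unit})+", seq):
        best = max(best, len(m.group(0)) // 5)
    return best


# Toy sequences (hypothetical): a short run flanked by poly-Ala, as in a
# dragline-like protein, and a long run followed by GGX units, as in a
# flagelliform-like protein.
dragline_like = "AAAAAA" + "GPGQQ" * 9 + "AAAAAA"
flagelliform_like = "GPGGA" * 43 + "GGS" * 6

print(longest_tandem_run(dragline_like))
print(longest_tandem_run(flagelliform_like))
```

With only two silks, the run length and extensibility give two data points, so the sketch illustrates the comparison rather than testing the hypothesis.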

The fourth shared motif, the "spacer," is the most complex repeat in silks.
The structures formed by these regions remain unknown. Although the minor
ampullate (Colgin & Lewis, 1998) and flagelliform (Hayashi & Lewis, 1998) spacers
differ radically in amino acid sequence, both are relatively long and contain
charged residues. Possible roles for the spacers include: 1) pre-fiber: promotion of
an alternative structure for the silk while it is stored in liquid form, to prevent
premature fiber formation within the silk glands; 2) within-fiber: alignment of
crystalline or other structural regions among individual protein molecules; and
3) extra-fiber: provision of surface regions that interact with other critical components
of silk (e.g., the association of flagelliform silk with sticky aggregate gland
secretions that are important for trapping prey).

Based on the associations presented above relating amino acid motifs to
secondary structures and secondary structures to functions in the silk fiber, we
view spider silks as sets of modules. These modules, such as the crystalline module
poly-Ala, provide specific properties to a silk fiber. Not only is the presence of a
particular module important, but the frequency of modules is critical. Thus, the
greater extensibility of flagelliform silk compared to major ampullate silk can be
attributed to flagelliform having more repeats of the elasticity module GPGXX.
Similarly, the greater tensile strength of major ampullate silk compared to minor
ampullate silk is due to the large poly-Ala component in major ampullate silk and
the presence of the weaker poly-Gly-Ala crystalline regions in minor ampullate silk.

These  modular  hypotheses  both  explain  the  properties  of  the  known  silks  and 
predict  the  properties  of  silks  yet  to  be  characterized.  It  may  be  that  spider  silks 
have  evolved  through  the  modification  and  shuffling  of  a  small  number  of  amino 
acid  motifs. 

(4).  Recombinant  Protein  and  Expression  Studies 

Based  on  our  previous  success  with  artificial  gene  construction  using 
nonregenerable  restriction  sites  (Lewis  et  al.,  1996),  we  designed  DNA  cassettes  to 
code  for  (GPGGYGPGGS)2  and  (GPGGA)4.  These  can  be  combined  to  generate 
sequences that are similar to the native protein. We can also construct GGX
repeats and the spacer sequences. Initial efforts are directed toward the
first two cassettes to aid in determining the structure of the proteins in the fiber.
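
How the two cassettes combine can be illustrated at the protein level. The sketch below simply concatenates the encoded peptides in a chosen order; it does not model the DNA cassettes, the restriction sites, or the cloning steps, and the cassette labels (`"YS"`, `"A"`) and function names are hypothetical.

```python
# Peptides encoded by the two cassettes named in the text.
CASSETTES = {
    "YS": "GPGGYGPGGS" * 2,  # (GPGGYGPGGS)2
    "A": "GPGGA" * 4,        # (GPGGA)4
}


def assemble(order):
    """Concatenate cassette peptides in the given order of labels."""
    return "".join(CASSETTES[label] for label in order)


def x_pattern(protein):
    """Return the fifth residue of each successive 5-residue unit."""
    return "".join(protein[i + 4] for i in range(0, len(protein), 5))


# One possible arrangement: an X=Ala block on either side of a Tyr/Ser block.
protein = assemble(["A", "YS", "A"])
print(len(protein))
print(x_pattern(protein))
```

Varying the order and number of labels gives arrays with different proportions of the X=Ala and alternating X=Tyr/Ser neighborhoods seen in the native sequence.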